Using Air Traffic Control Taskload Measures and Communication Events to Predict Subjective Workload


Introduction 

Sensitive and valid workload measures are needed for en route air traffic control (ATC) to identify potential negative effects of new forms of automation or ATC procedures on controllers (Wickens, Mavor, & McGee, 1997) and to ensure that intended benefits for controller productivity have been achieved. ATC workload can be influenced by many factors, including the number and configuration of aircraft moving through a sector, the activities the controller performs to control those aircraft, and the controller’s reaction to the air traffic situation.

Measures of ATC workload are typically based on controllers’ subjective ratings, made either while controlling air traffic or just afterwards. One problem with workload ratings obtained while controlling traffic is that their values may be influenced by the effort required to generate and record the ratings. On the other hand, workload ratings provided after the controller finishes controlling traffic may be influenced by extraneous factors, such as remembering only events that occurred early or late in the traffic sample (e.g., due to proactive or retroactive inhibition).

Research is being conducted to develop objective workload estimates that could replace subjective workload ratings by computing variables from routinely recorded ATC data (Buckley, DeBaryshe, Hitchner, & Kohn, 1983; Galushka, Frederick, Mogford, & Krois, 1995; Mills, Pfleiderer, & Manning, 2002; Stager, Ho, & Garbutt, 2001). These taskload measures usually describe both aircraft and controller activities. From a research perspective, objective taskload measures are preferable to subjective workload ratings because it is often easier and less expensive to obtain routinely recorded ATC data than to have air traffic controllers participate in research simulations. Objective taskload measures are also not influenced by rater errors such as leniency/severity errors or errors of central tendency (Landry, 1989). Finally, computing objective measures from recorded data does not interfere with controllers’ activities and thus does not affect their perceived workload.


Although objective measures have some obvious benefits, it has also been argued that they do not provide a complete representation of ATC workload. While these measures capture variations in ATC activity, they cannot capture a controller’s personal reaction to the air traffic situation (Stein, 1998). Stein contends that individual differences among controllers influence how they perceive the effects of a particular taskload. Thus, subjective workload ratings are affected by a component that cannot be derived simply by analyzing recorded data. However, other research has found significant correlations between objective taskload and subjective workload measures (Stein, 1985; Manning, Mills, Fox, Pfleiderer, & Mogilka, 2001), suggesting that taskload measures alone may provide sufficient information to evaluate the effects of new systems.

Communications between pilots and controllers, and between controllers and other controllers, are also recorded routinely and so may be included among a set of objective taskload measures. Communication events can be described by the number of communications, the timing of these events, and the content of the communications.

Pilot-controller communications are thought to be related to ATC workload because complicated communications can increase workload (Morrow & Rodvold, 1998). Bruce (1993) found that both traffic volume and traffic complexity (both frequently used indicators of ATC taskload) were significantly related to the number of pilot/controller communications. Cardosi (1993) examined the number and timing of communication events as a function of message type in a descriptive study of voice communications related to traffic avoidance. As part of this study, she used the number of communications per hour to classify time periods as high- or low-workload.

Corker, Gore, Fleming, and Lane (2000) used communication time as an indicator of workload against which to assess alternative free flight conditions. Porterfield (1997) examined the relationship between the amount of time spent communicating and on-line workload ratings. He found a correlation of r = .88 and concluded that controller communication duration is a valid measure of workload.

Besides indicating the amount of activity, communication events have another advantage as an indicator of workload: their content and associated affective components may indicate how much effort the controller experienced when the event occurred. Thus, these measures may capture at least part of the subjective component of workload that Stein (1998) argues is not accounted for by other taskload measures. Moreover, analyzing recorded voice communications does not interfere with the controller’s task.

On the other hand, there are some disadvantages associated with the use of communications measures. First, determining the number and duration of communication events requires a considerable amount of time and manual labor, and coding their content and affect requires even more effort. Thus, the use of communication events would seem to be inconsistent with the goal of obtaining an easily computed set of taskload measures, unless they add significantly to the prediction of subjective workload. Second, recorded communication events do not account for all communication activity because some exchanges (e.g., communications between the radar [R] controller and the data [D] controller) are not recorded. Thus, analysis of recorded communications will provide, at best, a lower-bound estimate of the communication component of workload.

Previous research suggests that certain communications measures, such as number and duration, are associated with workload. However, a related question is whether distinguishing between pilot/controller and controller/controller communications, or coding the content of communications, adds a unique component to the prediction of subjective workload over and above that contributed by other objectively measured controller and aircraft activities. If counts and durations of communication events measure something different from ATC taskload measures, as evidenced by low correlations between the variables, and they contribute a unique component to the prediction of subjective workload, then it would be worth expending the effort required to obtain and analyze them. If, on the other hand, communication events are highly correlated with other objectively measured ATC activities and with subjective workload, then they will contribute little unique variance to the prediction of subjective workload, and the effort required to extract them would be of little value. Given previous research suggesting that communication events are related to taskload, we expected the communication events measured here to be so highly correlated with our taskload measures that they would not make a unique contribution to the prediction of subjective workload.

The purpose of this study was to examine the relationships among communication events, subjective workload, and objective taskload measures. The communication events analyzed were the total number of communications, total time spent communicating, average time spent on an individual communication, and communication content. The number of communication events and time spent communicating were analyzed separately for each type of speaker (controller or other). The number and timing of a controller’s own communications should be related to subjective workload, but having to attend to other speakers could also be a component of workload.

We proposed several hypotheses about the relationships among these measures. First, we expected that the total number and duration of communication events would be significantly related to busyness, as measured both by subjective workload and by objective taskload measures. As the traffic situation gets busier, more communication events should occur, and more time should be spent communicating, both by the controller and by other speakers.

Second, we expected that the average time for an individual communication event would be negatively related to both workload and taskload. As the traffic situation gets busier, the amount of time spent on a single communication should decline. This effect is likely to depend on the identity of the speaker: controllers are likely to shorten their individual communications, while other speakers are unlikely to be as affected by activity occurring in the sector.

Third, we expected that the content of communication events might be related to sector activity. As the situation gets busier, there should be more clearances, readbacks, and pilot requests. However, the number of advisories or unrelated remarks may not change. We also expected that the number of clearances issued to pilots would be related to subjective workload, while the number of radio frequency changes issued would be unrelated.

Fourth, if communication events are significantly related to sector activity, we expected that they would not contribute uniquely to the prediction of subjective workload over and above the contribution of the taskload measures. Thus, we expected that taskload measures alone would account for most of the variance in a set of subjective workload ratings, and that this prediction would not improve when communication measures were added to the set of predictors.




If any measures derived from communication events do indeed add a unique component to the prediction of subjective workload, then it would be worth taking the time to analyze the transmissions. If they do not, it would not be necessary to analyze them.
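The fourth hypothesis asks whether communication measures raise the proportion of explained variance once the taskload measures are already in the model. This section does not describe the authors' analysis procedure, so the following is only a minimal sketch of one standard approach: a two-step (hierarchical) least-squares regression with an F test on the change in R². The function names, the NumPy implementation, and the synthetic demo data are assumptions for illustration.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def r2_change_test(taskload, comms, workload):
    """Step 1: taskload predictors only. Step 2: taskload plus communication predictors.

    Returns (R^2 step 1, R^2 step 2, F for the R^2 change, df1, df2).
    """
    n = len(workload)
    k1 = taskload.shape[1]
    k2 = k1 + comms.shape[1]
    r2_1 = r_squared(taskload, workload)
    r2_2 = r_squared(np.column_stack([taskload, comms]), workload)
    df1, df2 = k2 - k1, n - k2 - 1
    f_change = ((r2_2 - r2_1) / df1) / ((1.0 - r2_2) / df2)
    return r2_1, r2_2, f_change, df1, df2


if __name__ == "__main__":
    # Synthetic demo only: communication measures that largely duplicate taskload.
    rng = np.random.default_rng(0)
    n = 40
    taskload = rng.normal(size=(n, 4))
    comms = taskload[:, :2] + 0.1 * rng.normal(size=(n, 2))
    workload = taskload @ np.array([0.5, 0.3, 0.2, 0.1]) + 0.5 * rng.normal(size=n)
    print(r2_change_test(taskload, comms, workload))
```

A small F (and a negligible R² change) would be consistent with the expectation that communication measures add little beyond taskload; the F statistic would be compared with an F distribution on (df1, df2) degrees of freedom.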

Method 

This study examined statistical relationships among communication events, objective taskload measures, and subjective workload measures. The communication events and taskload measures were obtained from samples of routinely recorded ATC data. The workload measures were provided by subject matter experts (SMEs) who observed graphical displays of the same ATC data samples (hereafter called “traffic samples”) and rated the workload they thought the R controller responsible for the sector had experienced. Each component of the study is discussed in more detail below.

Traffic  Samples 

System Analysis Report (SAR) data and voice communication tapes were obtained for 12 traffic samples recorded during January 1999 at four sectors (sectors 14, 30, 52, and 54) in the Kansas City Air Route Traffic Control Center (ARTCC). The ATC data were extracted by the Data Analysis and Reduction Tool (DART; Federal Aviation Administration, 1993) and the National Track Analysis Program (NTAP; Federal Aviation Administration, 1991). The resulting files were processed by both the Systematic Air Traffic Operations Research Initiative (SATORI; Rodgers & Duke, 1993) and the Performance and Objective Workload Evaluation Research (POWER; Mills, Pfleiderer, & Manning, 2002) software programs. SATORI synchronizes extracted data from DART and NTAP files with tapes containing the R controller’s voice communications, using the time code common to both data sources, while POWER uses data from a subset of the DART files to compute taskload measures. Three traffic samples were re-created for each sector. One traffic sample (used to train the SMEs) was eight minutes long. The two remaining experimental traffic samples were each 20 minutes long.

Participants 

Participants were 16 en route air traffic control instructors from the FAA Academy in Oklahoma City, OK. All had formerly been fully certified controllers at en route centers. Two participants had controlled traffic at some of the Kansas City Center sectors included in the traffic samples, though none had worked all the sectors included in the study.

Sector  training  materials 

Computerized training sessions described the characteristics and applicable procedures for each sector. Participants examined copies of sector maps and the sector binder (which contained additional information about the sector). Participants could also examine flight plan information (derived from recorded flight strip messages) for each aircraft controlled by the sector during the traffic sample.

Subjective  workload 

Participants provided subjective workload assessments using the Air Traffic Workload Input Technique (ATWIT; Stein, 1985). The ATWIT measures mental workload in “real time” by presenting auditory and visual cues that prompt a controller to press one of seven buttons within a specified amount of time to indicate the amount of mental workload experienced at that moment. In this study, instead of rating their own workload, participants entered ATWIT ratings to indicate the amount of mental workload they thought the R controller experienced in reaction to the taskload that occurred during the traffic sample. Participants were prompted every four minutes during each traffic sample to provide ATWIT ratings.

Objective  taskload  measures 

The objective taskload measures used in this study were derived from the POWER software (Mills, Pfleiderer, & Manning, 2002). The POWER measures included information about the number of controlled aircraft; handoffs made and accepted; altitude changes; controller data entries and data entry errors; variations in aircraft headings, speeds, and altitudes; and the average time aircraft were under control. In all, 23 POWER measures were used in this study.

Communication  events 

Communication events were obtained from the voice tapes associated with the traffic samples. The measures analyzed in this study were the total number of communications, the total time spent communicating during a traffic sample, and the time required for individual communication events (for all speakers, and separately for the controller and other speakers).
The communication events were also categorized by content. The content categories were based on a set derived by Prinzo, Britton, and Hendrix (1995) and consisted of 1) Address, 2) Courtesy, 3) Clearance, 4) Advisory/Remark, 5) Request, 6) Readback/Acknowledgment, and 7) Non-codable. The Clearance category was then divided into two subcategories, Instructional Clearances and Frequency Changes, to distinguish clearances instructing an aircraft how to proceed from more routine instructions for the pilot to change radio frequency when leaving the sector. Communications were not otherwise separated into specific message types (e.g., altitude or heading clearances) or coded as errors (e.g., transposed numbers/letters), in order to retain sufficient numbers for analysis.

Procedure 

Participants reviewed a description of the study, completed consent and biographical information forms, and then reviewed instructions for making the ATWIT workload assessments, as well as two other types of post-scenario subjective workload assessments not analyzed in this study. For each of the four sectors, participants 1) reviewed training materials, 2) observed one 8-minute training traffic sample, and 3) observed two 20-minute experimental traffic samples. To ensure continuity, all traffic samples for a sector were shown consecutively as a block. The order of the four blocks of traffic samples (corresponding to the four sectors) was counterbalanced, as was the order of presentation of the two experimental traffic samples within each block.

During each traffic sample, participants recorded any controller errors they observed using a behavioral observation form (see Manning et al., 2001, for more details). The ATWIT aural prompt occurred every four minutes, and participants responded by entering a number between 1 and 7 on a keypad. At the end of each traffic sample, participants completed the other subjective workload assessments, summed the errors they had recorded, and then completed an over-the-shoulder performance rating form (see Manning et al., 2001, for more details). Completing the training process and observing the three traffic samples for each sector required about 1.5 hours.

Data  processing 

Communication events during each traffic sample were transcribed. The message content of each transmission was categorized, and the identity of the speaker (i.e., controller, pilot, other speaker) and the start and stop times were recorded. These data were used to compute the total number of communications and the time spent communicating during each 4-minute period, as well as the mean time for individual communication events and the frequency of each content category.
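The per-period measures described above can be sketched as follows. This is a hypothetical illustration, not the processing software used in the study: the `Transmission` record, the pooling of pilots and other controllers into a single "other" speaker, and the rule that assigns a transmission to the 4-minute period in which it starts are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

SEGMENT_SECONDS = 240  # 4-minute analysis periods
SPEAKERS = ("controller", "other")  # "other" pools pilots and other controllers (assumption)


@dataclass
class Transmission:
    speaker: str   # one of SPEAKERS
    start: float   # seconds from the start of the traffic sample
    stop: float


def segment_measures(transmissions, n_segments=5):
    """Per-segment counts, total talk time, and mean transmission duration,
    computed separately for each speaker type and for all speakers combined."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for t in transmissions:
        seg = int(t.start // SEGMENT_SECONDS)  # assign by start time (assumption)
        if 0 <= seg < n_segments:
            counts[(seg, t.speaker)] += 1
            totals[(seg, t.speaker)] += t.stop - t.start
    rows = []
    for seg in range(n_segments):
        row = {}
        for who in SPEAKERS:
            n, total = counts[(seg, who)], totals[(seg, who)]
            row[f"n_{who}"] = n
            row[f"time_{who}"] = total
            row[f"mean_{who}"] = total / n if n else float("nan")
        row["n_all"] = sum(row[f"n_{w}"] for w in SPEAKERS)
        row["time_all"] = sum(row[f"time_{w}"] for w in SPEAKERS)
        # Pooled mean duration (total time / total count) for the period
        row["mean_all"] = row["time_all"] / row["n_all"] if row["n_all"] else float("nan")
        rows.append(row)
    return rows
```

Each 20-minute experimental traffic sample yields five rows, one per 4-minute period.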

The 23 POWER measures were computed for the five 4-minute segments included in each experimental traffic sample. The other two workload assessments were not analyzed in this study because they were obtained only at the end of each traffic sample, so only eight observations (one for each traffic sample) were available for analysis.

The ATWIT ratings were averaged across participants for each 4-minute segment included in each experimental traffic sample, resulting in 40 observations (8 traffic samples × 5 segments).
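A minimal sketch of this averaging step, assuming ratings are stored as (participant, traffic sample, segment, rating) tuples; the record layout and the function name are hypothetical. With 8 experimental traffic samples of five 4-minute segments each, the result has 40 cells.

```python
from collections import defaultdict
from statistics import mean


def mean_atwit(ratings):
    """ratings: iterable of (participant, sample, segment, rating) tuples, rating on the 1-7 scale.

    Returns {(sample, segment): mean rating across participants}.
    """
    by_cell = defaultdict(list)
    for _participant, sample, segment, rating in ratings:
        if not 1 <= rating <= 7:
            raise ValueError(f"ATWIT rating out of range: {rating}")
        by_cell[(sample, segment)].append(rating)
    return {cell: mean(values) for cell, values in sorted(by_cell.items())}
```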

Results 

Subjective workload. Mean ATWIT ratings for the 4-minute periods within the eight traffic samples ranged from 2.01 to 3.54. The mean rating across the eight traffic samples was 2.76 (SD = .59). This value is significantly lower than 4, the midpoint of the 7-point workload scale (t(39) = -13.2, p < .001), suggesting that participants thought workload was generally low during the traffic samples.
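The reported test can be checked from the summary statistics alone, since a one-sample t statistic is (mean - 4) / (SD / √n) with n - 1 degrees of freedom. The sketch below plugs in the rounded values reported above (n = 40 periods); it yields about -13.29, which agrees with the reported t(39) = -13.2 to within the rounding of the mean and SD.

```python
import math


def one_sample_t(sample_mean, sd, n, mu0):
    """Return (t, df) for H0: population mean == mu0, from summary statistics."""
    standard_error = sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error, n - 1


# Reported values: mean = 2.76, SD = .59, n = 40 periods; 4 is the scale midpoint.
t, df = one_sample_t(sample_mean=2.76, sd=0.59, n=40, mu0=4.0)
print(f"t({df}) = {t:.2f}")  # prints t(39) = -13.29
```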

Communication events. Nine hundred ninety-nine communication events (about 125 per traffic sample, on average) were recorded during the eight traffic samples. Of these, 471 (47%) were made by a controller, and 528 (53%) were made by another speaker (a pilot or another controller). The average number of communication events in a 4-minute period was 25.0 (SD = 10.0). Controllers made, on average, 11.8 (SD = 4.8) of these communications, while other speakers made 13.2 (SD = 5.4).

On average, the total amount of time spent communicating during a 4-minute period was 69.18 seconds (SD = 25.0), or about 29% of the 240 available seconds. Controllers spent, on average, 38.3 seconds (SD = 15.3) speaking during each 4-minute period, while others spoke for an average of 30.9 seconds (SD = 12.1).
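The derived figures in the two paragraphs above follow from the reported totals by simple arithmetic. The short sketch below reproduces them, using the 40 four-minute periods (8 traffic samples × 5 segments) described in the Method section; the variable names are illustrative.

```python
# Reported totals (Results, Communication events)
events, by_controller, by_other = 999, 471, 528
traffic_samples, segments_per_sample = 8, 5
periods = traffic_samples * segments_per_sample  # 40 four-minute periods

per_sample = events / traffic_samples              # 124.875, "about 125"
share_controller = 100 * by_controller / events    # about 47%
share_other = 100 * by_other / events              # about 53%
per_period = events / periods                      # 24.975, reported as 25.0
controller_per_period = by_controller / periods    # 11.775, reported as 11.8
other_per_period = by_other / periods              # 13.2
talk_share = 100 * 69.18 / 240                     # about 29% of each 240-s period

print(f"{per_sample:.1f} per sample, {share_controller:.0f}% / {share_other:.0f}%, "
      f"{per_period:.1f} per period, {talk_share:.0f}% of time talking")
```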

The average duration of a single communication event was 2.86 seconds (SD = 0.63). Single communication events lasted, on average, 3.38 seconds (SD = 0.95) for controllers and 2.41 seconds (SD = 0.60) for other speakers.

The average number of communication events by content is shown in Table 1. Because each transmission may have included more than one topic of conversation, each communication event may include more than one content category. Thus, the content categories summed to more occurrences in a 4-minute period than the number of communication events that occurred.

Table 1. Descriptive statistics for communication event content categories.

Content of communication events         Mean      SD
Address                                 15.1     5.4
Courtesy                                 4.4     2.6
Advisory                                 5.2     3.2
Request                                  2.6     2.3
Readback                                 9.9     4.6
Instructional clearances                 3.8     2.1
Frequency changes

Addresses occurred most frequently, about 15 times per 4-minute period on average. Readbacks occurred about 10 times per period. Requests, instructional clearances, and frequency changes occurred least often. Non-codable communications are not reported here and were excluded from all subsequent analyses.
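Because a single transmission can carry several content codes, category counts are tallied per code rather than per transmission. The sketch below shows such a multi-label tally with Non-codable codes dropped, as described above. The category strings, the set-per-transmission representation, and the example transmissions are hypothetical.

```python
from collections import Counter

# Content codes after splitting Clearance into two subcategories (labels are illustrative).
CATEGORIES = frozenset({
    "Address", "Courtesy", "Instructional clearance", "Frequency change",
    "Advisory/Remark", "Request", "Readback/Acknowledgment", "Non-codable",
})


def tally_content(coded_transmissions, exclude=("Non-codable",)):
    """coded_transmissions: one set of content codes per transmission.

    Returns (number of transmissions, Counter of code occurrences). Because a
    transmission may carry several codes, the Counter total can exceed the
    number of transmissions.
    """
    counts = Counter()
    for codes in coded_transmissions:
        unknown = set(codes) - CATEGORIES
        if unknown:
            raise ValueError(f"unknown content codes: {sorted(unknown)}")
        counts.update(code for code in codes if code not in exclude)
    return len(coded_transmissions), counts


# Hypothetical example: a call sign plus a clearance is coded as two categories.
n, counts = tally_content([
    {"Address", "Instructional clearance"},
    {"Address", "Readback/Acknowledgment"},
    {"Courtesy"},
    {"Non-codable"},
])
print(n, sum(counts.values()), counts["Address"])  # prints 4 5 2
```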

Table 2 shows intercorrelations among the communication event measures computed for the 4-minute periods. Total times and numbers of communications, for both controllers and other speakers, were highly correlated with each other. The average times for individual communication events were significantly correlated with each other and (negatively) with the total number and duration of communication events, but these correlations were not very high. The number of Addresses was significantly correlated with all other content categories, but this was not true of any other content category. Readbacks were highly correlated with Addresses and Advisories and were related to all other content categories except Frequency Changes. Frequency Changes were significantly related only to Addresses and Courtesies.
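Whether a correlation counts as significant here depends on the number of observations. A minimal sketch, assuming the 40 four-minute periods are the unit of analysis and a two-tailed test at α = .05 (this section does not state the exact test used): r is converted to t = r√(n - 2)/√(1 - r²), and inverting that relation gives the smallest |r| that reaches significance. The critical t of about 2.024 for 38 degrees of freedom is supplied by hand rather than computed.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic, with n - 2 degrees of freedom, for H0: rho == 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def critical_r(t_crit, n):
    """Smallest |r| that reaches significance, given the critical t for df = n - 2."""
    return t_crit / math.sqrt(t_crit ** 2 + (n - 2))


# n = 40 four-minute periods -> df = 38; two-tailed .05 critical t is about 2.024 (supplied by hand).
print(round(critical_r(2.024, 40), 3))  # prints 0.312
```

Under these assumptions, correlations with absolute values of about .31 or more would be significant at the .05 level.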

Because this pattern of correlations, while interesting, was difficult to interpret, two Principal Components Analyses (PCAs) with Varimax rotation were conducted to identify a smaller set of components that would describe the relationships among the communication events more concisely. The first PCA included only the variables describing the number and duration of communication events; the second included only the content categories. We conducted separate PCAs because we judged the counts and timing of communication events to be sufficiently different from their content.


The first PCA, which included variables describing the number and duration of communication events, produced two components with eigenvalues greater than 1. Together, the two components accounted for 81.6% of the variability in the data set. The rotated component matrix is shown in Table 3. The entries in the table are correlations (loadings) between each communication measure and the two components derived from the analysis. For ease of interpretation, correlations less than .3 were excluded from the table.
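The steps described above (PCA on the correlation matrix, retaining components with eigenvalues greater than 1, Varimax rotation, and suppressing loadings below .3 for display) can be sketched with NumPy. This is an illustrative re-implementation under assumptions, not the statistical package the authors used; in particular, many packages apply Kaiser normalization before the Varimax rotation, which this sketch omits, so loadings may differ slightly.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal Varimax rotation (SVD-based iteration, no Kaiser normalization)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if criterion > 0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation


def pca_varimax(data, suppress_below=0.3):
    """PCA of the correlation matrix of `data` (rows = 4-minute periods, columns = measures).

    Keeps components with eigenvalue > 1, Varimax-rotates their loadings, and
    returns (percent of variance retained, rotated loadings, display copy with
    |loading| < suppress_below replaced by NaN).
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)                 # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                                     # Kaiser criterion
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])     # variable-component correlations
    pct_variance = 100.0 * eigvals[keep].sum() / eigvals.sum()
    rotated = varimax(loadings)
    display = np.where(np.abs(rotated) >= suppress_below, rotated, np.nan)
    return pct_variance, rotated, display
```

Because the rotation is orthogonal, the percentage of variance retained is the same before and after rotation; only the distribution of loadings across components changes.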

The number and duration of all communications that occurred during the 4-minute period had high correlations with component 1; thus, it was labeled All Communications Number and Duration. The mean time for an individual communication event, for both controllers and other speakers, had the highest correlations with component 2, although total communication time for controllers was positively correlated and the number of communications by other speakers was negatively correlated with this component. Thus, component 2 was labeled Individual Communication Duration.

The second PCA, which included variables describing the communication content categories, produced three components with eigenvalues greater than 1. These components accounted for 84.2% of the variability in the data set. The rotated component matrix is shown in Table 4. For ease of interpretation, correlations less than .3 were excluded from the table.

Requests and Advisories had the highest correlations with component 1, although Readbacks and Addresses were also correlated with it. Thus, component 1 was labeled Requests and Advisories. Frequency Changes and Courtesies had the highest correlations with component 2, although Addresses and Readbacks were also correlated with it. Thus, component 2 was labeled Frequency Changes/Courtesies. Instructional Clearances had the highest correlation with component 3, although Addresses, Advisories, and Readbacks were also correlated with it. Thus, component 3 was labeled Instructional Clearances.


Table 2. Intercorrelations of communication events.

Column headings: N comms | N comms - Other | Total comm | Total comm | Mean time | Mean time | Addr | Crtsy | Advsry | Reqst | Rdbck | Instruct clrnce | Freq change
Table 3. Rotated component matrix for 2 principal components representing number and duration of communication events.

Communication number and duration measures    Comp 1: All Communications    Comp 2: Individual
                                              Number and Duration           Communication Duration
Total N comms - controller                    .95
Total N comms - other speaker                 .90                           -.35
Total comm time - controller                  .83                           .38
Total comm time - other speaker               .93
Mean time individual comm - controller                                      .87
Mean time individual comm - other speaker                                   .74

*Correlations less than .3 are not displayed.


Table 4. Rotated component matrix for 3 components representing communication content categories.

Communication content     Comp 1: Requests/     Comp 2: Frequency     Comp 3: Instructional
measures                  Advisories            Changes/Courtesies    Clearances
Address                   .50                   .53                   .53
Courtesy                                        .84
Advisory                  .81                                         .38
Request                   .93
Readback                  .60                   .40                   .60
Instructional clearance                                               .92
Frequency change                                .96

*Correlations less than .3 are not displayed.
Taskload measures. Table 5 shows descriptive statistics for the 23 POWER measures averaged across the 4-minute periods in each traffic sample. Some of the POWER measures (primarily certain kinds of data entries, such as handoffs and altitude changes) occurred several times during the 4-minute periods. However, many of the other data entries (e.g., pointouts, data block offsets, Distance Reference Indicators [DRIs, also known as J-rings], and track reroutes) and the conflict alerts (both displayed and suppressed) occurred very infrequently. In fact, many of the variables occurred in fewer than 30% of the time segments, resulting in near-zero means with standard deviations greater than the means. Subsequent analyses excluded these infrequent variables.
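The frequency screen can be sketched as a filter on the proportion of segments in which a variable is non-zero. This sketch applies only the 30% threshold mentioned above, assuming each segment is stored as a dict of variable values; the function and variable names are hypothetical, and the filter will not necessarily reproduce every exclusion marked in Table 5.

```python
def frequent_variables(segments, min_fraction=0.30):
    """Return names of variables that are non-zero in at least `min_fraction` of segments.

    segments: list of dicts, one per 4-minute segment, mapping variable name -> value.
    """
    if not segments:
        return []
    keep = []
    for name in segments[0]:
        nonzero = sum(1 for seg in segments if seg.get(name, 0) != 0)
        if nonzero / len(segments) >= min_fraction:
            keep.append(name)
    return keep


# Hypothetical segments: R data entries appear in 3 of 4 segments, DRIs in only 1.
segments = [
    {"r_data_entries": 9, "dris_requested": 0},
    {"r_data_entries": 12, "dris_requested": 0},
    {"r_data_entries": 0, "dris_requested": 1},
    {"r_data_entries": 7, "dris_requested": 0},
]
print(frequent_variables(segments))  # prints ['r_data_entries']
```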

Moreover, two variables (R controller data entries and D controller data entries) were compilations of all the specific types of data entries (such as Data Block Offsets, Route Displays, R and D controller Pointouts, DRIs requested and deleted, and altitude changes). If all specific data entries were summed, they would equal the combined values of the R and D controller data entries.


It is not appropriate to analyze both individual measures and a variable that comprises their sum, so for the purpose of this study, the individual measures were excluded from further analysis. However, the average time to accept a handoff and the average time until initiated handoffs (HOs) are accepted were retained for analysis because they are independent of the number of handoffs made and accepted.

To reduce the number of POWER measures by grouping similar variables, correlations among the measures were first computed. These are shown in Table 6. Significant correlations were observed between a number of the variables. However, visual examination of the correlations did not provide a systematic method for interpreting the relationships among variables, so a PCA with Varimax rotation was conducted to identify a smaller set of components that would describe the relationships among the POWER measures more concisely. Four components with eigenvalues greater than 1 were produced, accounting for 71.2% of the variance in the data.


Table 5. Descriptive statistics for POWER measures obtained at 4-minute intervals.

POWER measures                                  Mean      SD
Total N aircraft controlled                     7.20    2.73
Max aircraft controlled simultaneously          5.48    2.35
Average time aircraft under control           158.35   34.38
Avg heading variation                           1.06    0.86
Avg speed variation                             4.22    2.46
Avg altitude variation                          2.00    1.48
* Total N altitude changes                      3.50    2.20
* Total N handoffs accepted                     1.15    1.12
Avg time to accept handoff                     25.91   27.58
* Total N handoffs initiated                    1.98    1.29
Avg time until initiated HOs are accepted      41.00   45.45
N Radar controller data entries                11.35    5.54
N Radar controller data entry errors            0.23    0.58
N Data controller data entries                  1.93    2.04
N Data controller data entry errors             0.08    0.27
* N Route displays                              0.40    0.84
* N Radar controller pointouts                  0.08    0.27
* N Data controller pointouts                   0.08    0.47
* N data block offsets                          0.15    0.43
* Total N CAs displayed                         0.08    0.27
* Number of CA suppression entries              0.05    0.22
* N DRIs requested                              0.05    0.22
* N DRIs deleted                                0.03    0.16

Note: * indicates variables excluded from further analysis.




The results of this analysis should be interpreted with
some caution because 1) only 4-minute time segments were
analyzed, and 2) only 40 observations from 4 sectors were
available for analysis. Subsequent analyses using larger
data sets should be conducted to obtain more stable
results. However, the primary purpose of conducting this
analysis was to derive a smaller number of variables to
be used in later analyses. Table 7 contains the rotated
component matrix for the 4 components. For ease of
interpretation, loadings (correlations between a measure
and a component) less than .3 were excluded.
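Blanking small loadings is purely a display convention. The underlying matrix is unchanged. A small sketch with a hypothetical helper name is shown below. It keeps any loading whose absolute value is at least .3, so a value such as -.33 still appears. The loadings in the example are invented for illustration.

```python
def format_loadings(rows, threshold=0.3):
    """Return text lines for a loading table, blanking entries with |loading| < threshold."""
    lines = []
    for name, values in rows:
        cells = [f"{v:5.2f}" if abs(v) >= threshold else " " * 5 for v in values]
        lines.append(f"{name:<12}" + "  ".join(cells).rstrip())
    return lines

# Invented loadings for two hypothetical measures, for illustration only.
example = [("measure_a", [0.94, 0.12, -0.05, 0.21]),
           ("measure_b", [0.44, 0.54, 0.08, -0.39])]
for line in format_loadings(example):
    print(line)
```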

Component 1 was primarily related to the number of
aircraft controlled, those controlled simultaneously, and
R controller data entries. To a lesser extent, the
component was also related to the time to accept
handoffs and control duration. Component 1 was labeled
Activity because higher values for these measures were
associated with the presence of more aircraft in a sector
that required more controller activity.

Component 2 was related to variation in heading, speed,
and altitude, and, to a lesser extent, control duration.
Component 2 was labeled Low Altitude Maneuvers because
these measures were related to aircraft maneuvering
consistent with arrivals at and departures from low
altitude sectors surrounding the St. Louis Lambert
Airport. This interpretation is supported by a comparison
of average heading, speed, and altitude variability,
which were all significantly higher in low altitude
sectors than in high altitude sectors (t(38) = 2.82, 5.75,
and 3.49, respectively; p < .01 in all three cases).
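The t(38) values above are consistent with an independent-samples t test with pooled variance. In that test, df = n1 + n2 - 2. With 40 four-minute observations, df = 38 however the observations are split between low and high altitude sectors. The report does not state which variant was used. This stdlib-only sketch assumes the pooled (Student) form, and the function name is hypothetical.

```python
import math
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Student's independent-samples t with pooled variance; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df

# Toy numbers only: a positive t means the first group (e.g., low altitude) has the larger mean.
t, df = pooled_t([3, 4, 5, 6, 7], [1, 2, 3, 4, 5])
print(t, df)
```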


Component 3 was primarily related to R controller data
entry errors and the time required to accept initiated
handoffs. To a lesser extent, it was also negatively
related to time to accept handoffs. These conditions were
consistent with busy R controllers making more data entry
errors, having to attend to whether the next controller
had accepted handoffs that he/she had initiated, and
taking longer to accept aircraft handed off to his/her
sector. Thus, Component 3 was called Overload.

Component 4 was primarily related to D controller data
entries and D controller data entry errors. To a lesser
extent, the component was also related to lower altitude
variation and lower control duration. While the number of
D controller data entries and errors was not related to
the number of aircraft in the sector, the presence of
aircraft that changed altitude less frequently and were
in the sector for a shorter period of time was related to
a higher number of D errors. Thus, Component 4 was called
D Activity.

Prediction  of  mental  workload 

Correlations. Table 8 shows correlations between the
ATWIT subjective workload rating, the four objective
workload components, the two components related to number
and duration of communication events, and the three
communication content components. Components derived from
the same principal components analysis are uncorrelated
by construction, so their intercorrelations are 0. The
mean ATWIT rating had a correlation of .80 (p < .01) with
Activity, a correlation of .62 (p < .01) with All
Communications Number and Duration, a correlation of .65
(p < .01) with Instructional Clearances, and a correlation
of .36 (p < .05) with Individual Communication Duration.


Table 7. Rotated component matrix for 4 components representing reduced set of
POWER measures.

POWER measures                             Comp 1:    Comp 2:    Comp 3:    Comp 4:
                                           Activity   Low Alt    Overload   D Activity
                                                      Maneuvers
Max aircraft controlled simultaneously      .94
Total N aircraft controlled                 .94
Avg Heading variation                                  .81
Avg Speed variation                                    .92
Avg Altitude variation                                 .73                  -.33
Avg time to accept handoff                  .40                  -.40
Avg time until initiated HOs are accepted                         .80
Avg time aircraft under control             .44        .54                  -.39
N Radar controller data entries             .76
N Radar controller data entry errors                              .87
N Data controller data entries                                               .76
N Data controller data entry errors                                          .77

* Correlations less than .3 are not displayed.

Activity was significantly correlated with All
Communications Number and Duration (r = .63, p < .01),
Instructional Clearances (r = .52, p < .01), and
Frequency Changes/Courtesies (r = .36, p < .05). The
Overload, Low Altitude Maneuvers, and D Activity
components were not significantly correlated with any of
the other variables. All Communications Number and
Duration was significantly correlated with all three
content components: for Requests/Advisories, r = .65,
p < .01; for Frequency Changes/Courtesies, r = .39,
p < .05; and for Instructional Clearances, r = .54,
p < .01. Individual Communication Duration was not
significantly correlated with any of the content
components.
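The usual t transformation shows whether a correlation reaches p < .05 or p < .01 with 40 observations: t = r * sqrt(n - 2) / sqrt(1 - r^2), on n - 2 degrees of freedom. The sketch below is a stdlib-only approximation that applies Simpson's rule to the t density. It is not a library routine, and the function names are hypothetical. Consider r = .36 with n = 40. The sketch gives t of about 2.38 on 38 df. That value lies between the two-tailed critical values for .05 (about 2.02) and .01 (about 2.71), which matches the p < .05 reported above.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p value: 1 - 2 * (integral of the t density from 0 to |t|), Simpson's rule."""
    b = abs(t)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3

def r_significance(r, n):
    """t statistic and two-sided p value for H0: population correlation = 0."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, two_sided_p(t, df)

t, p = r_significance(0.36, 40)
print(f"t(38) = {t:.2f}, p = {p:.3f}")
```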

Regression.  A  set  of  analyses  was  performed  to  assess 
the  effectiveness  of  alternative  multiple  regression 
models  in  predicting  the  subjective  ATWIT  ratings. 


Table 9 shows the results of these analyses. Row 1 shows
the multiple correlation of the full model containing the
4 taskload, 2 communication number and duration, and 3
communication content components as predictors. The
multiple correlation of the full model with the ATWIT
ratings was R = .88, accounting for about 77% of the
variance in the subjective workload ratings. Succeeding
rows show multiple correlations for alternative (reduced)
regression models containing fewer than the total number
of predictors. The column containing F for the test of R²
change compares the effectiveness of a reduced model with
the effectiveness of the full model in predicting the
ATWIT rating. If the p value of this test is greater than
.05 (that is, the change in R² between the two models is
not significantly different from 0), then the reduced
model is considered to be as effective as the full model
in predicting subjective workload. On the other hand, if
the p value is less than or equal to .05, the change in
R² between the two models is significantly different from
0, and the reduced model is not considered to be as
effective as the full model in predicting subjective
workload. The goal is to identify a reduced model that
contains as few predictors as possible but accounts for
a high enough percentage of the variance in the dependent
variable to be considered equivalent to the full model.


Table 9. Results of analyses comparing alternative multiple regression models predicting ATWIT
ratings.

Regression model                                  R     R²    R²      F for     df      p
                                                              change  R² change
1. Full model containing all taskload,            0.88  0.77  N/A     N/A
   communication number and duration, and
   communication content components

Reduced models based on taskload components
2. Model containing all taskload components       0.82  0.67  0.11    2.80      5, 30   .034
3. Model containing only the Activity component   0.80  0.65  0.13    2.13      8, 30   .064*
4. Model containing all taskload components       0.15  0.02  0.75    16.66     6, 30   .000
   except Activity

Reduced models based on communications components
5. Model containing five communication            0.78  0.60  0.17    5.67      4, 30   .002
   components
6. Model containing two communication number      0.72  0.52  0.25    4.78      7, 30   .001
   and duration components
7. Model containing three communication           0.69  0.48  0.29    6.49      6, 30   .000
   content components
8. Model containing All Communications Number     0.73  0.53  0.25    4.67      7, 30   .001
   and Duration and Instructional Clearances
   components
9. Model containing Instructional Clearances      0.65  0.43  0.35    5.81      8, 30   .000
   component

Reduced models combining taskload and communications components
10. Model containing Activity, All                0.85  0.72  0.05    1.11      6, 30   .378*
    Communications Number and Duration, and
    Instructional Clearances components
11. Model containing Activity and                 0.85  0.72  0.05    1.02      7, 30   .439*
    Instructional Clearances components

* Indicates reduced models that predicted ATWIT ratings as well as the full model.
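The F for the test of R² change compares nested models:

F = [(R²_full - R²_reduced)/(k_full - k_reduced)] / [(1 - R²_full)/(N - k_full - 1)]

The test has (k_full - k_reduced, N - k_full - 1) degrees of freedom. With N = 40 and 9 predictors in the full model, the denominator df is 30, as in Table 9. The sketch below uses only the standard library, and its function names are hypothetical. It obtains p by Simpson integration of the F density, which this sketch supports only for numerator df of at least 2. If it is fed the rounded R² values printed in the table, its F values differ slightly from the printed ones. Those were presumably computed from unrounded R² values.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    log_c = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2) - math.lgamma(d2 / 2)
             + (d1 / 2) * math.log(d1 / d2))
    return math.exp(log_c) * x ** (d1 / 2 - 1) * (1 + d1 * x / d2) ** (-(d1 + d2) / 2)

def f_upper_tail(f, d1, d2, steps=4000):
    """P(F > f) via Simpson's rule on [0, f]; needs d1 >= 2 so the density is finite at 0."""
    if d1 < 2:
        raise ValueError("this sketch requires numerator df >= 2")
    h = f / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(f, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3

def r2_change_test(r2_full, r2_reduced, k_full, k_reduced, n):
    """F test of the R-squared lost when predictors are dropped from a nested OLS model."""
    d1 = k_full - k_reduced      # number of predictors dropped
    d2 = n - k_full - 1          # residual df of the full model
    f = ((r2_full - r2_reduced) / d1) / ((1 - r2_full) / d2)
    return f, (d1, d2), f_upper_tail(f, d1, d2)

# Row 10 of Table 9, using the rounded R-squared values printed there.
f, df, p = r2_change_test(r2_full=0.77, r2_reduced=0.72, k_full=9, k_reduced=3, n=40)
print(f"F{df} = {f:.2f}, p = {p:.3f}")
```

For row 10, the sketch gives df = (6, 30) and an F of about 1.09. That is well below the .05 critical value of about 2.42, so the reduced model is treated as equivalent to the full model.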

The analysis of 10 reduced models is shown in Table 9
(see rows 2-11). The first group of analyses (rows 2-4)
compared reduced models containing different combinations
of taskload components with the full model. The second
group of analyses (rows 5-9) compared reduced models
containing different combinations of communication
components with the full model. The final group of
analyses (rows 10-11) compared reduced models containing
combinations of both taskload and communications
components with the full model.

As an example, row 2 compared a reduced model containing
all the taskload components with the full model. The
model containing all the taskload components had an R²
of .67, compared with the full model's R² of .77. The F
computed to assess the R² change of .11 had a value of
2.80, with a p value of .034; that is, the loss in R² was
significantly greater than 0. Thus, the reduced model
containing all the taskload components was significantly
different from the full model in predicting ATWIT ratings
and was not as effective as the full model.

A second example is shown in row 3, which compared a
reduced model containing only the Activity taskload
component with the full model. The model containing only
the Activity component had an R² of .65, compared with
the full model's R² of .77. The F computed to assess the
R² change of .13 had a value of 2.13, with a p value of
.064, so the change in R² was not significantly greater
than 0. Thus, using an alpha level of .05, the reduced
model containing only the Activity component predicted
ATWIT ratings as well as the full model.

A third example is shown in row 4. The reduced model
containing all the taskload components except the
Activity component had an R² of .02, compared with the
full model's R² of .77. The F computed to compare the R²
change of .75 had a value of 16.66, with a p value less
than .0001, so the change in R² was significantly greater
than 0. Thus, the reduced model containing all the
taskload components except the Activity component did not
predict the ATWIT ratings as well as the full model.

Three of the reduced models shown in Table 9 (one
containing the Activity component alone, one containing
the Activity, All Communications Number and Duration, and
Instructional Clearances components, and one containing
the Activity and Instructional Clearances components)
predicted ATWIT ratings as well as the full model. Thus,
among the models tested, a reduced model had to contain
the Activity component to be equivalent to the full
model. None of the reduced models containing any
combination of the communications components was
equivalent to the full model unless it also contained the
Activity component.

Discussion  and  Conclusions 

We  formed  several  hypotheses  about  the  relationships 
between  the  communications  variables,  objective  taskload 
variables,  and  subjective  workload.  These  were: 

1. Total number and duration of communication events
will have a significant and positive correlation with
workload and taskload.

2. Average time for an individual communication event
should be negatively related to workload and taskload.

3. The content of communication events may be related
to sector activity.

4. Communication events will not provide a unique
contribution to the prediction of subjective workload,
over and above the prediction contributed by the
taskload measures.

Before conducting the analyses, we derived sets of
independent principal components to reduce the number of
variables analyzed to a manageable set, given the number
of observations in the data set. Thus, the analyses that
tested the hypotheses were based on components consisting
of weighted combinations of communication and taskload
variables instead of the individual variables.
Examination of Table 8 shows that four components
(Activity, All Communications Number and Duration,
Individual Communication Duration, and Instructional
Clearances) had significant correlations with the ATWIT
ratings. Thus, certain aspects of taskload, the number
and duration of communication events, and the content of
communications are all related to subjective workload.

Table 8 also shows that the All Communications Number and
Duration, Frequency Changes/Courtesies, and Instructional
Clearances components were significantly correlated with
the Activity component. Thus, communication activity is
related to taskload, especially clearances involving
instructions to proceed.

Our prediction about the duration of individual
communications was found to be only partially accurate.
While the Individual Communication Duration component was
significantly related to the ATWIT rating, it was not
significantly correlated with any of the taskload
components. Moreover, the principal components analysis
did not produce different components for different
speakers, suggesting that the identity of the speaker who
generated the communication events was not important.

We did, however, find that the content of communications
was significantly related to certain types of sector
activity. Instructional Clearances and Frequency
Changes/Courtesies were significantly correlated with
Activity.

Because we expected communications variables to overlap
extensively with the taskload variables, we hypothesized
that variables measuring communication events would not
contribute uniquely to the prediction of subjective
workload, over and above the prediction contributed by
the taskload measures. Table 9 compared the effectiveness
of a number of reduced regression models containing
different combinations of taskload and communications
variables in predicting subjective workload with the
effectiveness of the full model containing all the
variables. The full model accounted for 77% of the
variance in subjective workload, while the reduced models
accounted for anywhere from 2% to 72% of the variance.
Three reduced models were (statistically) as effective as
the full model in predicting subjective workload. The
reduced model containing Activity accounted for 65% of
the variance in subjective workload. The reduced model
containing Activity and Instructional Clearances
accounted for 72% of the variance in subjective workload.
The reduced model containing Activity, All Communications
Number and Duration, and Instructional Clearances also
accounted for 72% of the variance in subjective workload.

A model containing all the taskload components except
Activity predicted subjective workload very poorly
compared with the full model. Furthermore, none of the
models containing only a combination of communications
variables predicted subjective workload as well as the
full model. For example, a reduced model containing all
communications components accounted for only 60% of the
variance in subjective workload, a model containing the
two communications number and duration components
accounted for only 52% of the variance, and a model
containing the three communications content components
accounted for only 48% of the variance.

An interesting finding from this analysis was that
Activity must be present in order for a reduced
regression model to predict ATWIT ratings as well as the
full model. This result suggests that variables whose
values increase as air traffic activity increases (such
as the number of aircraft, data entries, control
duration, time to accept handoffs, etc.) have an
important effect on the controllers' perception of
workload.

The question that must be answered is whether the
inclusion of any communications measures added a unique
component to the prediction of subjective workload over
and above the contribution of taskload. According to the
analysis, Activity alone was statistically equivalent to
the full model, accounting for 65% of the variance in
subjective workload. However, adding the Instructional
Clearances component produced a reduced model that
contained only two variables and accounted for 72% of the
variance in subjective workload. While Activity alone
seems to be a good predictor of subjective workload, the
combination of Activity and Instructional Clearances is
slightly better.

Thus, these data suggest that those who only have access
to SAR files will be able to derive a very good estimate
of subjective workload using controller activity data.
However, those who have access to both SAR files and
recordings of communication events and want to invest the
time required to analyze the transcripts may be able to
obtain a better estimate of subjective workload. The
question is whether the information gained is worth the
additional time investment. And while it appears that it
is not necessary to analyze voice communications data to
assess controller workload adequately, the analysis of
communications data is often valuable for other reasons.

The constraints associated with this study should be
considered when interpreting these results. First, the
limited selection of sectors (only four, all at one
center, and all surrounding a busy airport) and traffic
samples (two per sector) limits the ability to generalize
these results to other sector types, traffic situations,
and facilities. Second, the small number of observations
included in the analysis limited our confidence in the
results. Third, SMEs provided subjective ratings of the
workload they thought other controllers were experiencing
instead of rating their own workload. If those who worked
the traffic had rated their own workload, the results
might have been different. Fourth, we assumed that the
ATWIT was the most appropriate method for measuring
subjective workload. If other workload measures had been
obtained, such as the NASA TLX (Hart & Staveland, 1988)
or physiological measures, the results might have been
different. These other methods may measure different
aspects of workload: the TLX is a post-hoc method
obtained only once, after the traffic sample, and
physiological measures may be influenced by factors
other than subjective workload.
Fifth, the workload experienced during all the traffic
samples was fairly low, as assessed by the SMEs. Perhaps
the effectiveness of communications measures would have
been more pronounced if a higher workload had been
experienced.

Even if the controller/pilot communications variables had
been found to provide a larger contribution to the
prediction of subjective workload, this relationship
might be expected to change soon. Controller/Pilot Data
Link Communications (CPDLC) will transmit some
pilot/controller communications via a digital channel,
thus increasing the visual processing and reducing the
auditory processing of communications. It has been
proposed that using CPDLC will reduce controller
workload, but more likely it will only shift the
distribution of workload from both visual and aural
modalities to a primarily visual one. Adding visual tasks
to an already extensive set of tasks currently performed
by controllers might increase overall workload more than
would be compensated for by reducing the number of voice
communications to which the controller must attend.
However, regardless of the effect on workload of
increasing the visual component of a controller's
activity, the workload associated with verbal
communications should be significantly reduced when most
are transferred to another information source.